A fast dendrogram refinement approach for unsupervised expansion of hierarchies

Authors

  • Ricardo M. Marcacini
  • Everton A. Cherman
  • Solange O. Rezende
Abstract

Hierarchies are effective data models for organizing textual collections, particularly for automatic document classification into categories and subcategories. However, most existing methods for hierarchical classification require a human-labeled document set. Moreover, humans have good insight for managing the higher levels of the hierarchy, i.e., the more general categories. Managing the more specific categories is a difficult and expensive task, since it requires expert knowledge to identify appropriate categories and their respective documents. Thus, in this paper we introduce an approach that automatically expands a reduced initial hierarchy, containing only general categories, with new and more specific categories. Our approach is based on text clustering methods, in particular on refining the dendrograms obtained by hierarchical clustering algorithms. The results of the experimental evaluation show that the proposed approach achieves better performance in the expansion of hierarchies than a traditional technique. Moreover, our approach is computationally faster, allowing the identification of new categories in large text collections.
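The abstract does not detail the refinement procedure itself. The sketch below only illustrates the general idea it builds on, and it makes several assumptions. Each general category of the initial hierarchy is expanded independently. A dendrogram is built over that category's documents with plain average-linkage agglomerative clustering on bag-of-words cosine similarity. The dendrogram is then cut at a fixed number `k` of candidate subcategories. This is not the authors' method, and all function names (`bow`, `build_dendrogram`, `cut`, `expand`) are hypothetical. The code uses only the Python standard library.

```python
from collections import Counter
from math import sqrt


def bow(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_dendrogram(vectors):
    """Average-linkage agglomerative clustering.

    Returns the merge history (id_a, id_b, new_id, similarity), i.e. a dendrogram.
    Leaves have ids 0..n-1; each merge creates a new cluster id starting at n.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = {i: [i] for i in range(n)}
    next_id, merges = n, []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Average linkage: mean pairwise similarity between members.
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg > best[2]:
                    best = (a, b, avg)
        a, b, avg = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id, avg))
        next_id += 1
    return merges


def cut(n, merges, k):
    """Replay the first n - k merges, leaving k flat clusters of leaf indices."""
    clusters = {i: [i] for i in range(n)}
    for a, b, new_id, _ in merges[: n - k]:
        clusters[new_id] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())


def expand(hierarchy, k):
    """Hypothetical expansion step: propose k subcategories per general category.

    hierarchy maps a general category name to its list of document texts; the
    result maps each category to lists of document indices (candidate subcategories).
    """
    expanded = {}
    for category, docs in hierarchy.items():
        merges = build_dendrogram([bow(d) for d in docs])
        expanded[category] = cut(len(docs), merges, min(k, len(docs)))
    return expanded
```

For example, `expand({"sports": ["football match goal", "football league goal", "tennis serve ace", "tennis match serve"]}, 2)` groups the two football documents and the two tennis documents into separate candidate subcategories. A fixed `k` is used here only for simplicity. Deciding where and how to cut or refine the dendrogram is exactly the part the paper addresses, and this sketch does not reproduce it.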

Similar resources

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This issue suggests the use of unsupervised models. One of the accurate methods in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

A Fast and Accurate Expansion-Iterative Method for Solving Second Kind Volterra Integral Equations

This article proposes a fast and accurate expansion-iterative method for solving second kind linear Volterra integral equations. The method is based on a special representation of vector forms of triangular functions (TFs) and their operational matrix of integration. Using this approach, solving the integral equation reduces to solving a recurrence relation. The approximate solution of integra...

Self-organized Reservoirs and Their Hierarchies

We investigate how unsupervised training of recurrent neural networks (RNNs) and their deep hierarchies can benefit a supervised task like temporal pattern detection. The RNNs are trained fully and quickly by unsupervised algorithms, and only supervised feed-forward readouts are used. The unsupervised RNNs are shown to perform better in a rigorous comparison against state-of-the-art random reservoir ne...

An information theoretic approach to hierarchical clustering combination

In hierarchical clustering, a set of patterns is partitioned into a sequence of groups represented as a dendrogram. The dendrogram is a tree representation in which each node is associated with the merging of two (or more) partitions, so each partition is nested into the next partition. Hierarchical representation has properties that are useful for visualization and interpretation of clustering...

Fast Finite Element Method Using Multi-Step Mesh Process

This paper introduces a new method for accelerating the currently slow FEM and reducing memory demand in FEM problems with high node resolution or bulky structures. Like most numerical methods, FEM results in a matrix equation that is normally of huge dimension. By breaking the main matrix equation into several smaller matrices, the solution procedure can be accelerated. For implementing ...


Publication date: 2012